11 research outputs found

    14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

    Chemistry and materials science are complex. Recently, there have been great successes in addressing this complexity using data-driven or computational techniques. Yet, the necessity of input structured in very specific forms and the fact that there is an ever-growing number of tools create usability and accessibility challenges. Coupled with the reality that much data in these disciplines is unstructured, the effectiveness of these tools is limited. Motivated by recent works that indicated that large language models (LLMs) might help address some of these issues, we organized a hackathon event on the applications of LLMs in chemistry, materials science, and beyond. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.

    A theoretical analysis of single molecule protein sequencing via weak binding spectra.

    We propose and theoretically study an approach to massively parallel single molecule peptide sequencing, based on single molecule measurement of the kinetics of probe binding (Havranek et al., 2013) to the N-termini of immobilized peptides. Unlike previous proposals, this method is robust to both weak and non-specific probe-target affinities, which we demonstrate by applying the method to a range of randomized affinity matrices consisting of relatively low-quality binders. This suggests a novel principle for proteomic measurement whereby highly non-optimized sets of low-affinity binders could be applicable for protein sequencing, thus shifting the burden of amino acid identification from biomolecular design to readout. Measurement of probe occupancy times, or of time-averaged fluorescence, should allow high-accuracy determination of N-terminal amino acid identity for realistic probe sets. The time-averaged fluorescence method scales well to weakly-binding probes with dissociation constants of tens or hundreds of micromolar, and bypasses photobleaching limitations associated with other fluorescence-based approaches to protein sequencing. We argue that this method could lead to an approach with single amino acid resolution and the ability to distinguish many canonical and modified amino acids, even using highly non-optimized probe sets. This readout method should expand the design space for single molecule peptide sequencing by removing constraints on the properties of the fluorescent binding probes.
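The idea of identifying an N-terminal residue from a weak-binding profile can be illustrated with a toy calculation. The sketch below is not the authors' method: it assumes simple equilibrium binding, so that a probe at concentration c occupies its target a fraction c / (c + Kd) of the time, and it picks the residue whose expected occupancy profile is closest (least squares) to the observed one. The function names, the two-probe affinity table, and all numerical values are hypothetical.

```python
def expected_occupancy(kd_molar, probe_conc_molar):
    """Equilibrium fraction of time a probe is bound (assumed 1:1 binding)."""
    return probe_conc_molar / (probe_conc_molar + kd_molar)


def identify_residue(observed, kd_table, probe_conc_molar):
    """Return the residue whose expected occupancy profile best matches `observed`.

    observed: measured occupancy fraction for each probe, in probe order.
    kd_table: hypothetical affinity table {residue: [Kd of each probe, in molar]}.
    """
    best_residue, best_distance = None, float("inf")
    for residue, kds in kd_table.items():
        profile = [expected_occupancy(kd, probe_conc_molar) for kd in kds]
        # Squared Euclidean distance between observed and expected profiles.
        distance = sum((o - p) ** 2 for o, p in zip(observed, profile))
        if distance < best_distance:
            best_residue, best_distance = residue, distance
    return best_residue


# Hypothetical example: two weak probes with Kd values of 10 and 100 micromolar.
kd_table = {"A": [100e-6, 10e-6], "G": [10e-6, 100e-6]}
print(identify_residue([0.10, 0.45], kd_table, probe_conc_molar=10e-6))  # prints A
```

Even though neither probe in this toy table is specific, the pattern of occupancies across probes separates the two residues, which is the intuition behind shifting the burden from probe design to readout.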

    RNA timestamps identify the age of single molecules in RNA sequencing

    © 2020, The Author(s), under exclusive licence to Springer Nature America, Inc. Current approaches to single-cell RNA sequencing (RNA-seq) provide only limited information about the dynamics of gene expression. Here we present RNA timestamps, a method for inferring the age of individual RNAs in RNA-seq data by exploiting RNA editing. To introduce timestamps, we tag RNA with a reporter motif consisting of multiple MS2 binding sites that recruit the adenosine deaminase ADAR2 fused to an MS2 capsid protein. ADAR2 binding to tagged RNA causes A-to-I edits to accumulate over time, allowing the age of the RNA to be inferred with hour-scale accuracy. By combining observations of multiple timestamped RNAs driven by the same promoter, we can determine when the promoter was active. We demonstrate that the system can infer the presence and timing of multiple past transcriptional events. Finally, we apply the method to cluster single cells according to the timing of past transcriptional activity. RNA timestamps will allow the incorporation of temporal information into RNA-seq workflows.
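To make the "edits accumulate over time" idea concrete, here is a minimal sketch under a strong simplifying assumption that is not taken from the paper: every editable adenosine in the reporter is edited independently at the same first-order rate, so the expected edited fraction at age t is 1 - exp(-k t). Inverting that gives an age estimate from the observed edit count. The function name and the rate value are hypothetical.

```python
import math


def estimate_age_hours(edited_sites, total_sites, rate_per_hour):
    """Estimate RNA age from the fraction of edited sites.

    Assumes independent, identical first-order editing at each site
    (an illustrative model, not the published inference procedure).
    """
    if total_sites <= 0 or not 0 <= edited_sites <= total_sites:
        raise ValueError("need 0 <= edited_sites <= total_sites and total_sites > 0")
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    fraction = edited_sites / total_sites
    if fraction >= 1.0:
        # A fully edited reporter is saturated; the model gives no upper bound.
        return math.inf
    # Invert fraction = 1 - exp(-k * t)  =>  t = -ln(1 - fraction) / k
    return -math.log1p(-fraction) / rate_per_hour


# Hypothetical example: 5 of 10 sites edited, assumed rate 0.1 edits per site per hour.
print(round(estimate_age_hours(5, 10, 0.1), 2))  # prints 6.93
```

The saturation case shows why a reporter with many sites is useful in such a model: the more sites there are, the longer the edited fraction stays informative about age.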

    HyPR-seq: Single-cell quantification of chosen RNAs via hybridization and sequencing of DNA probes

    © 2020 National Academy of Sciences. All rights reserved. Single-cell quantification of RNAs is important for understanding cellular heterogeneity and gene regulation, yet current approaches suffer from low sensitivity for individual transcripts, limiting their utility for many applications. Here we present Hybridization of Probes to RNA for sequencing (HyPR-seq), a method to sensitively quantify the expression of hundreds of chosen genes in single cells. HyPR-seq involves hybridizing DNA probes to RNA, distributing cells into nanoliter droplets, amplifying the probes with PCR, and sequencing the amplicons to quantify the expression of chosen genes. HyPR-seq achieves high sensitivity for individual transcripts, detects nonpolyadenylated and low-abundance transcripts, and can profile more than 100,000 single cells. We demonstrate how HyPR-seq can profile the effects of CRISPR perturbations in pooled screens, detect time-resolved changes in gene expression via measurements of gene introns, and detect rare transcripts and quantify cell-type frequencies in tissue using low-abundance marker genes. By directing sequencing power to genes of interest and sensitively quantifying individual transcripts, HyPR-seq reduces costs by up to 100-fold compared to whole-transcriptome single-cell RNA-sequencing, making HyPR-seq a powerful method for targeted RNA profiling in single cells.